We consider the problem of determining the weights of a quantum ensemble. That is to say, given 
a quantum system that is in a set of possible known states according to an unknown probability 
law, we give strategies to estimate the individual probabilities, weights, or mixing proportions. 
Such strategies can be used to estimate the frequencies at which different independent signals are 
emitted by a source. They can also be used to estimate the weights of particular terms in a 
canonical decomposition of a quantum channel. The quality of these strategies is quantified by a 
covariance-type error matrix. According to this cost function, we give optimal strategies in both 
the single-shot and multiple-copy scenarios. The latter is also analyzed in the asymptotic limit of a 
large number of copies. We give closed expressions of the error matrix for two-component quantum 
mixtures of qubit systems. The Fisher information plays an unusual role in the problem at hand, 
providing exact expressions of the minimum covariance matrix for any number of copies. 

PACS numbers: 03.67.Hk, 03.65.Ta, 03.65.Wj 



I. INTRODUCTION 

Suppose we are given a quantum system which is 
known to be in one of several states with some unknown 
probability, such as a photon that travels through a com- 
munication channel and codifies some message. These 
states can be non-orthogonal due to, e.g., errors occur- 
ring during the transmission, but can also be made to 
overlap intentionally, e.g., to avoid possible eavesdropper 
attacks in quantum key distribution. Given this set of 
possible fixed states, we wish to find an estimate of the 
probabilities that best describe the state we have been 
provided with. More succinctly, assuming that a state $\rho_\lambda$
is a convex combination of a given set of states $\{\rho_r\}_{r=1}^{M}$,

$$\rho_\lambda = \sum_{r=1}^{M} \lambda_r \rho_r , \qquad (1)$$

we wish to best estimate the value of the weights $\{\lambda_r\}$,
which we arrange in the column vector $\lambda$ and which characterize
the quantum ensemble $\{(\lambda_r, \rho_r)\}$, by performing suitable
measurements on the system.

The analogous classical problem appears in the field 
of statistical modeling under the name of estimation of 
finite mixtures. The formal study of finite mixtures 
was initiated by Pearson in 1894 [2]. He was conducting a 
biometric investigation on data collected from crabs, and 
found that the distribution of the size of their forehead 
(relative to the size of the body) presented an unexpected 
skewness, which could not be modeled with a symmet- 
ric normal distribution. Pearson showed that the data 
was very well fitted by a mixture of two normal distri- 
butions. The presence of two components was taken by 
Pearson as evidence that there were two different species 
of crabs. In this way finite mixture models can be used 
to expose any grouping in underlying data (clustering 
of data). With the prior knowledge on the individual 
component densities, which can be inferred or estimated 
by other means, finite mixture estimation enables one to 



estimate the weights, or proportions, of the different populations
from the gathered coarse-grained data. A (classical) finite mixture,
$p_\lambda(i) = \sum_r \lambda_r p_r(i)$ (in the obvious notation), can thus
always be interpreted as describing situations where the information on
the grouping is lost, or in other words, as marginals of a joint
distribution $p(i,r)$, such that $p_\lambda(i) = \sum_r p(i,r)$, where
$p_r(i)$ can be viewed as the conditional probability $p_r(i) = p(i|r)$.

In this paper we approach the problem of estimating quantum finite
mixtures. More precisely, we give optimal strategies to estimate the
vector of weights $\lambda$ under the assumptions given above (known set
$\{\rho_r\}$ of possible states). We also address the situation in which
we are provided with $N$ identical and independent copies of the state
$\rho_\lambda$, which we will refer to as the average state. In this
multiple-copy scenario, we further assume that generalized collective
measurements can be performed on $\rho_\lambda^{\otimes N}$. For large $N$,
we also give (local) strategies based on projective measurements on
individual copies that have the same performance as the optimal
collective strategies.

Quantum ensembles are necessary to describe situa- 
tions in which complete prior information is lacking. In 
the context of quantum communication, for instance, one 
estimates the frequency of different (known) states com- 
ing out of a source, i.e., one gathers information from the 
average state in connection to its particular preparation 
procedure. It is well known that in general there is no 
unique quantum ensemble consistent with a given mixed 
state. Therefore there will be instances in quantum finite mixture
estimation, called unidentifiable, where the average state $\rho_\lambda$
does not fully determine the value of the weights $\lambda$, which
therefore cannot be estimated with unlimited precision even when an
arbitrary number of copies of $\rho_\lambda$ is provided. This problem is
related to that of discrimination of quantum ensembles [4], where it is
necessary to consider as inequivalent the different ensembles that are
consistent with a given mixed state. We also note that, as in the
classical case, a quantum finite mixture can be interpreted as the
marginal density matrix of an extended system-ancilla state when a
particular measurement is done on the ancilla. A quantum ensemble 
also describes the output of a stochastic quantum chan- 
nel (or generalized measurement) for a fixed input state. 
In particular, if the input state is taken to be one part 
of a bipartite maximally entangled state, the stochastic 
channel is fully characterized by the output state, and it 
can be interpreted as a quantum finite mixture. There- 
fore, the results that we present here can be applied to 
the estimation of the weights of the individual (or of a 
sub-set of) Kraus operators in a particular operator sum 
representation of a channel. For example, we can easily 
give bounds on the precision of estimating the weight of 
bit flip, phase flip, and combined bit-phase flip errors, or 
also the total weight of 2-qubit Pauli errors versus single 
qubit Pauli errors. 

Quantum finite mixture estimation is a novel ground for quantum
estimation theory [5-8], which is one of the basic tools in the field of
quantum information and has been continuously developing since the late
70's. Many problems have been addressed, ranging from the estimation of a
single parameter (e.g., a phase [8], or the losses of a quantum channel)
to full tomography. Quantum estimation theory also finds many
applications in quantum metrology, such as improvement of frequency
standards, gravitational-wave detection and clock synchronization [13],
and it is often a key ingredient in other quantum computation and
communication topics, e.g., quantum benchmarks for teleportation
experiments [17]. The recent problem studied by Konrad et al. can be
viewed as a quantum finite mixture estimation in a simplified context. In
the present paper we address the issue in full generality. This, in
passing, will enable us to answer most of the questions posed there.

The paper is organized as follows. In Section II we introduce the
general framework and give the main results for both the single- and
multiple-copy scenarios. The asymptotic limit of a large number of copies
is addressed in Section III. The two sections conclude with a discussion
on unidentifiability of mixtures and its consequences. Additionally, in
each of these sections, we provide examples to illustrate the use of the
techniques that we introduce. Section IV is devoted to two-component
mixtures, where closed expressions can be given for rather general
situations. The conclusions are in Section V and several technical
details can be found in the appendixes, which also include an example of
a two-step adaptive local strategy that is optimal.



II. ESTIMATION OF WEIGHTS IN FINITE 
MIXTURES 

A. General framework 

As already mentioned in the introduction, a quantum finite mixture is
defined to be the convex combination in Eq. (1), where $\lambda$ belongs to
the unit $(M-1)$-simplex (i.e., the set
$\{\lambda : \lambda_r \ge 0, \sum_{r=1}^{M} \lambda_r = 1\}$). By quantum
finite mixture estimation we mean the following: assume we have been
provided with a copy of the average state $\rho_\lambda$ (or with several
identical and independent copies of it; i.e., with
$\rho_\lambda^{\otimes N}$), for which we know nothing about the actual
value of $\lambda$ except that it has been drawn from a (prior) probability
distribution $\pi(\lambda)$. Assume also that we are allowed to perform
generalized measurements on the copy (or copies) of $\rho_\lambda$. Our
task is to determine $\lambda$ (or, maybe, some linear combination of its
components $\lambda_r$; namely, $a = \mathbf{a}^t \lambda$, where
$\mathbf{a}$ is some vector of constants $a_r$). This necessarily has to
be based on the outcome(s) of our measurement(s) on $\rho_\lambda$
($\rho_\lambda^{\otimes N}$). Due to the inherent nature of quantum
measurements, the determination of $\lambda$ cannot be perfect and we can
only hope to obtain an estimate within some accuracy. Our goal is to
obtain the best estimate.

To give a precise meaning to the term 'best estimate' we take a
Bayesian approach and introduce as cost function the covariance-type
error matrix

$$\Delta = \left\langle (\lambda - \hat\lambda_x)(\lambda - \hat\lambda_x)^t \right\rangle , \qquad (2)$$

where $\hat\lambda_x$ is our estimate of $\lambda$ based on the outcome $x$
of our measurement and $\langle\,\cdot\,\rangle$ stands for averaging over
$\lambda$ and $x$. More precisely, the averaging is performed over the
joint probability distribution $p(x,\lambda) = p(x|\lambda)\pi(\lambda)$,
where $p(x|\lambda)$ is the probability of obtaining the outcome $x$
conditioned on the actual value of $\lambda$. In Quantum Mechanics this
conditional probability is given by Born's rule:
$p(x|\lambda) = \mathrm{tr}\, E_x \rho_\lambda$, where $\{E_x\}$ is the
positive operator-valued measure (POVM) that defines our generalized
quantum measurement. The trace of the error matrix $\Delta$ gives the
total mean square error (MSE), $E = \mathrm{tr}\,\Delta$, while the
expectation value $E_a = \mathbf{a}^t \Delta\, \mathbf{a}$ gives the mean
square error in the estimation of $a$.

In order to analyse one-copy and multiple-copy estimation in a unified
framework we have found it convenient to define quantum finite mixtures,
Eq. (1), in a slightly more general form, allowing for non-linear
mixtures of the type

$$\rho_\lambda = \sum_{\alpha} c_\alpha(\lambda)\, \rho_\alpha , \qquad (3)$$

where the coefficient functions satisfy $\sum_\alpha c_\alpha(\lambda) = 1$
for all $\lambda$ [but not necessarily $c_\alpha(\lambda) \ge 0$], and the
range of values for $\alpha$ may not coincide with that for $r$ in (1). As
for linear finite mixtures, our goal still is to best estimate $\lambda$
(we assume that the functional dependence of the coefficient functions
$c_\alpha$ on $\lambda$ is known).
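The error matrix $\Delta$ can be written as

$$\Delta = \sum_x p(x) \left\langle (\lambda - \hat\lambda_x)(\lambda - \hat\lambda_x)^t \right\rangle_x , \qquad (4)$$

where $p(x)$ is the marginal of $p(x,\lambda)$ and
$\langle\,\cdot\,\rangle_x$ indicates averaging over the conditional
probability $p(\lambda|x) = p(x,\lambda)/p(x)$ (Bayes rule). More
explicitly, $p(x) = \int d\lambda\, p(x,\lambda)$, where we use the
shorthand notation
$d\lambda = \delta\!\left(\sum_r \lambda_r - 1\right) \prod_r d\lambda_r$.
Note that the Dirac $\delta$-function, along with $\lambda_r \ge 0$,
guarantees that $\lambda$ is a point in the unit $(M-1)$-simplex
(hereafter, simplex for brevity). Eq. (4) can be cast as

$$\Delta = \sum_x p(x) \left[ \langle \lambda\lambda^t \rangle_x - \langle \lambda \rangle_x \langle \lambda^t \rangle_x + \delta_x \delta_x^t \right] , \qquad (5)$$

with $\delta_x = \hat\lambda_x - \langle \lambda \rangle_x$. Note that all
dependence on our particular choice of the estimator $\hat\lambda_x$ is
contained in $\delta_x$. Since the matrix $\delta_x \delta_x^t$ is
manifestly positive semi-definite, the estimator that minimizes our cost
function $\Delta$ is

$$\hat\lambda_x = \langle \lambda \rangle_x = \int d\lambda\, p(\lambda|x)\, \lambda = \frac{\int d\lambda\, \pi(\lambda)\, \lambda\, \mathrm{tr}\, E_x \rho_\lambda}{\int d\lambda\, \pi(\lambda)\, \mathrm{tr}\, E_x \rho_\lambda} . \qquad (6)$$

(Note that the components of $\hat\lambda_x$ are non-negative and add up
to one; i.e., $\hat\lambda_x$ is a probability vector.) Hereafter, we will
only consider this optimal estimator, which gives the smallest error
matrix. We will denote this matrix by the same symbol $\Delta$ to simplify
the notation. Hence, we may write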



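$$\Delta = \sum_x p(x) \left[ \langle \lambda\lambda^t \rangle_x - \langle \lambda \rangle_x \langle \lambda^t \rangle_x \right] . \qquad (7)$$

By rearranging the remaining terms in (5) one can further simplify the
expression of the error matrix to obtain

$$\Delta = \langle \lambda\lambda^t \rangle - \sum_x p(x)\, \langle \lambda \rangle_x \langle \lambda^t \rangle_x , \qquad (8)$$

where it is important to note that the first average is over the prior
distribution $\pi(\lambda)$ alone, i.e., independent of the measurements
we may perform on the average state. As to the second term, we may write
the average value of $\lambda$ as

$$\langle \lambda \rangle_x = \langle \lambda \rangle + \frac{\sum_\alpha \Lambda_\alpha\, \mathrm{tr}\, E_x \rho_\alpha}{p(x)} , \qquad (9)$$

where we have defined
$\Lambda_\alpha = \langle \lambda\, c_\alpha(\lambda) \rangle - \langle \lambda \rangle \langle c_\alpha(\lambda) \rangle$
and used that
$\sum_\alpha \langle c_\alpha(\lambda) \rangle\, \mathrm{tr}\, E_x \rho_\alpha = p(x)$.
Inserting this result in (8) we find

$$\Delta = \Lambda - \sum_x \frac{\left( \sum_\alpha \Lambda_\alpha\, \mathrm{tr}\, E_x \rho_\alpha \right) \left( \sum_\beta \Lambda_\beta^t\, \mathrm{tr}\, E_x \rho_\beta \right)}{p(x)} , \qquad (10)$$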



where $\Lambda = \langle \lambda\lambda^t \rangle - \langle \lambda \rangle \langle \lambda^t \rangle$
is the covariance matrix of the unknown weights, i.e., its elements are
the second order (central) moments of the prior distribution
$\pi(\lambda)$. In order to interpret the second term in this equation,
we define an effective state $\sigma_\lambda$ that combines information
relative to the prior distribution of $\lambda$ with the quantum states
$\rho_\alpha$:

$$\sigma_\lambda = \langle \rho_\lambda \rangle + (\lambda - \bar\lambda)^t \sum_\alpha \Lambda_\alpha \rho_\alpha , \qquad (11)$$

where we have defined $\bar\lambda = \langle \lambda \rangle$. It is shown
in Appendix A that this equation defines a proper density matrix. Let
$p_\lambda(x)$ be the probability distribution of the outcomes obtained
when performing the POVM measurement $\{E_x\}$ on this effective state,
namely $p_\lambda(x) = \mathrm{tr}\, E_x \sigma_\lambda$. Then, Eq. (10)
can be written in a very appealing form as

$$\Delta = \Lambda - F(\bar\lambda) , \qquad (12)$$



where $F(\lambda)$ is the Fisher information matrix of the probability
distribution $p_\lambda(x)$, whose elements are defined by

$$F_{rs}(\lambda) = \sum_x \frac{\partial_r p_\lambda(x)\, \partial_s p_\lambda(x)}{p_\lambda(x)} , \qquad (13)$$

and we use the compact notation $\partial_r = \partial/\partial\lambda_r$.
Some comments are in order. Note that the error matrix $\Delta$ has two
distinct contributions: i) the intrinsic 'error' of the random variable
$\lambda$ (that one would obtain by just guessing the weights of the
quantum finite mixture without performing any measurement whatsoever),
which is given by the covariance matrix $\Lambda$; and ii) the Fisher
Information of the effective state $\sigma_\lambda$, which represents the
information gathered from the outcomes of the measurement on the average
state $\rho_\lambda$. Naturally, this information reduces the uncertainty
on the actual value of $\lambda$, which explains the minus sign in (12).
Despite this very natural interpretation, one might be somewhat surprised
to find the Fisher Information matrix in the context of Eq. (12). It
usually appears in connection with the Cramér-Rao bound (see Sec. III A
below), where it provides lower bounds to the MSE in estimation problems.
Typically these lower bounds are attained only in the asymptotic limit of
many identical and independent copies. Note however that relation (12) is
an exact expression.
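The exactness of relation (12) can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: the two pure qubit states, the computational-basis measurement, and all function and variable names are hypothetical choices, not taken from the paper. It evaluates the Bayesian error matrix directly from its definition with the posterior-mean estimator of Eq. (6), and compares it with the covariance of a flat prior minus the Fisher information of the effective model, as in Eqs. (10) and (12).

```python
import numpy as np

def pure_state(theta):
    """Density matrix of the real qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    v = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return np.outer(v, v)

# Hypothetical ensemble and measurement (illustrative choices, not from the paper).
rhos = [pure_state(0.0), pure_state(np.pi / 3)]
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
q = np.array([[np.trace(E @ r) for r in rhos] for E in povm])  # q[x, r] = tr(E_x rho_r)

# Flat prior on the 1-simplex: lambda = (t, 1 - t) with t uniform on [0, 1].
# Gauss-Legendre quadrature is exact here because all integrands are polynomials.
nodes, weights = np.polynomial.legendre.leggauss(8)
t, w = 0.5 * (nodes + 1.0), 0.5 * weights
lam = np.stack([t, 1.0 - t], axis=1)

def prior_avg(values):
    """Prior average of an array whose first axis indexes the quadrature nodes."""
    return np.tensordot(w, values, axes=(0, 0))

lam_bar = prior_avg(lam)
Lam = prior_avg(lam[:, :, None] * lam[:, None, :]) - np.outer(lam_bar, lam_bar)

# (a) Direct Bayesian error matrix with the posterior-mean estimator, Eq. (6).
delta_direct = np.zeros((2, 2))
for x in range(len(povm)):
    lik = lam @ q[x]                          # p(x | lambda) at each node
    p_x = prior_avg(lik)                      # marginal p(x)
    post_mean = prior_avg(lik[:, None] * lam) / p_x
    dev = lam - post_mean
    delta_direct += prior_avg(lik[:, None, None] * dev[:, :, None] * dev[:, None, :])

# (b) Prior covariance minus Fisher information of the effective model, Eq. (12).
fisher = np.zeros((2, 2))
for x in range(len(povm)):
    p_x = lam_bar @ q[x]                      # tr(E_x sigma) at lambda = lam_bar
    grad = Lam @ q[x]                         # derivative of tr(E_x sigma_lambda)
    fisher += np.outer(grad, grad) / p_x
delta_fisher = Lam - fisher

print(np.allclose(delta_direct, delta_fisher))
```

For a linear mixture the vectors $\Lambda_\alpha$ are simply the columns of $\Lambda$, which is why the gradient is written as a matrix-vector product.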

More interestingly for our purposes here, relation (12) enables us to
apply known results concerning the Fisher Information. In particular, the
Braunstein and Caves inequality states that the Fisher Information is
upper bounded by the so-called Quantum Fisher Information (QFI) matrix
$H(\lambda)$. Thus,

$$\Delta \ge \Lambda - H(\bar\lambda) . \qquad (14)$$



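Before proceeding, we recall the definition of $H(\lambda)$. Its matrix
elements, which depend only on the family of states $\sigma_\lambda$, are
given by

$$H_{rs}(\lambda) = \Re\, \mathrm{tr} \left[ L_r(\lambda) L_s(\lambda)\, \sigma_\lambda \right] , \qquad (15)$$

where the matrix $L_r(\lambda)$ is the Symmetric Logarithmic Derivative
(SLD), (implicitly) defined as

$$\tfrac{1}{2} \left[ L_r(\lambda)\, \sigma_\lambda + \sigma_\lambda L_r(\lambda) \right] = \partial_r \sigma_\lambda . \qquad (16)$$

Although Eqs. (15) and (16) are particularized to the case under
consideration, they also apply to a general situation where
$\sigma_\lambda$ represents an arbitrary family of states, such as that
defined by $\rho_\lambda$. We also recall that the SLD is most easily
computed in the basis that diagonalizes $\sigma_\lambda$. A simple
calculation leads to

$$L_r(\bar\lambda) = 2 \sum_{n,m} \frac{\langle \phi_n | \sum_\alpha [\Lambda_\alpha]_r\, \rho_\alpha | \phi_m \rangle}{p_n + p_m}\, | \phi_n \rangle \langle \phi_m | , \qquad (17)$$

where $\{|\phi_n\rangle\}$ and $\{p_n\}$ are the eigenvectors and
eigenvalues of $\sigma_{\bar\lambda} = \langle \rho_\lambda \rangle$,
respectively.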

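Let us go back to Eq. (14). Since $H(\bar\lambda)$ is independent of the
measurement (as pointed out above, it only depends on the effective state
$\sigma_\lambda$), Eq. (14) provides an absolute lower bound to the error
matrix $\Delta$.

In those cases where this lower bound is attainable [such as the
dimension two case, where $\lambda = (\lambda, 1-\lambda)^t$, or when the
SLD matrices commute with one another], the QFI matrix further provides
us with the optimal measurement. In those cases $\{E_x\}$ can be chosen
to be the projectors onto the eigenspaces of $L_r(\bar\lambda)$. An
important instance is the estimation of the linear combination
$a = \mathbf{a}^t \lambda$. In this case the optimal measurement is given
by the projectors onto the eigenspaces of
$L_a = \sum_r a_r L_r(\bar\lambda)$, and the minimal error $E_a$ is
exactly given by

$$E_a = \mathbf{a}^t \Lambda\, \mathbf{a} - 2 \sum_{n,m} \frac{\left| \langle \phi_n | \sum_\alpha \mathbf{a}^t \Lambda_\alpha\, \rho_\alpha | \phi_m \rangle \right|^2}{p_n + p_m} , \qquad (18)$$

which comes from sandwiching Eq. (12) with $\mathbf{a}^t$ and
$\mathbf{a}$. In particular, the MSE on a single weight $\lambda_r$ is
given by

$$E_r = \Lambda_{rr} - 2 \sum_{n,m} \frac{\left| \langle \phi_n | \sum_\alpha [\Lambda_\alpha]_r\, \rho_\alpha | \phi_m \rangle \right|^2}{p_n + p_m} . \qquad (19)$$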
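The SLD construction of Eqs. (15) to (17) and the attainability statement for two-component mixtures can be illustrated with a short numerical sketch. Everything concrete here is an assumption made for illustration: the two pure qubit states, the flat prior (whose covariance for $M = 2$ has entries $\pm 1/12$), and the helper names `sld` and `pure_state`. The code solves the SLD equation in the eigenbasis of the averaged state, builds the QFI matrix, and checks that a projective measurement onto the eigenvectors of $L_1$ yields a Fisher information equal to the QFI.

```python
import numpy as np

def sld(sigma, dsigma, tol=1e-12):
    """Solve (L sigma + sigma L)/2 = dsigma in the eigenbasis of sigma, cf. Eq. (17)."""
    p, phi = np.linalg.eigh(sigma)
    d = phi.conj().T @ dsigma @ phi
    denom = p[:, None] + p[None, :]
    safe = np.where(denom > tol, denom, 1.0)   # avoid division by zero on the kernel
    L = np.where(denom > tol, 2.0 * d / safe, 0.0)
    return phi @ L @ phi.conj().T

def pure_state(theta):
    """Density matrix of the real qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    v = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return np.outer(v, v)

# Hypothetical two-component ensemble with a flat prior (illustrative assumptions).
rhos = [pure_state(0.0), pure_state(np.pi / 3)]
Lam = np.array([[1.0, -1.0], [-1.0, 1.0]]) / 12.0      # prior covariance for M = 2
sigma_bar = 0.5 * (rhos[0] + rhos[1])                   # averaged state <rho_lambda>
dsigma = [sum(Lam[r, a] * rhos[a] for a in range(2)) for r in range(2)]

slds = [sld(sigma_bar, ds) for ds in dsigma]
H = np.array([[np.trace(Lr @ Ls @ sigma_bar).real for Ls in slds] for Lr in slds])

# Fisher information of the projective measurement onto the eigenvectors of L_1.
_, vecs = np.linalg.eigh(slds[0])
F = np.zeros((2, 2))
for x in range(2):
    E = np.outer(vecs[:, x], vecs[:, x].conj())
    p_x = np.trace(E @ sigma_bar).real
    grad = np.array([np.trace(E @ ds).real for ds in dsigma])
    F += np.outer(grad, grad) / p_x

mse = np.trace(Lam - H)   # minimal total MSE for this toy ensemble
print(np.allclose(F, H), mse)
```

The guard on `denom` only matters when the averaged state is rank deficient; for the two distinct pure states chosen here it is full rank.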



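Quantum finite mixtures of orthogonal states
($\rho_\alpha \rho_\beta = 0$ for $\alpha \neq \beta$) are yet another
instance where the bound (14) is attainable. In this case, one can easily
check that the MSE is simply given by

$$E = \mathrm{tr}\, \Delta = \mathrm{tr}\, \Lambda - \sum_\alpha \frac{\Lambda_\alpha^t \Lambda_\alpha}{\langle c_\alpha(\lambda) \rangle} . \qquad (20)$$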



B. Estimation with multiple copies

Let us assume that we are given an arbitrary number $N$ of identical
and independent copies of the average state $\rho_\lambda$ in (1). The
global state of the $N$ copies can be written as

$$\rho_\lambda^{\otimes N} = \sum_{k} N! \prod_{r} \frac{\lambda_r^{k_r}}{k_r!}\; \mathcal{S}\!\left( \rho_1^{\otimes k_1} \otimes \cdots \otimes \rho_M^{\otimes k_M} \right) , \qquad (21)$$

where the components of the 'occupation number' vector $k$ satisfy
$\sum_r k_r = N$, and $\mathcal{S}$ indicates averaging over all
permutations of the $N$ copies, which produces a proper (normalized)
state. From this equation we note that the state
$\rho_\lambda^{\otimes N}$ can be written in the form (3) with $k$ playing
the role of $\alpha$ and
$c_k(\lambda) = N! \prod_r \lambda_r^{k_r}/k_r!$,
$\rho_k = \mathcal{S}(\rho_1^{\otimes k_1} \otimes \cdots \otimes \rho_M^{\otimes k_M})$.
Because of this, the results of the previous section can be applied to
multiple copies.

For arbitrary prior distributions $\pi(\lambda)$ that is about all we
can say concerning the multiple-copy scenario. However, more explicit
expressions can be derived if a flat distribution of weights can be
assumed. This is the most conservative scenario, and also the situation
when nothing is known a priori about the weights $\lambda$. Appendix B
collects useful formulae for computing integrals and averages on the
simplex when $\pi(\lambda)$ is flat (constant). From this appendix one can
easily obtain

$$\Lambda_{rs} = \frac{\delta_{rs} - 1/M}{M(M+1)} , \qquad (22)$$

$$[\Lambda_k]_r = \frac{k_r - N/M}{N+M}\, \langle c_k(\lambda) \rangle , \qquad (23)$$

for the matrix elements of $\Lambda$ and $\Lambda_k$ respectively, where

$$\langle c_k(\lambda) \rangle = \binom{N+M-1}{N}^{-1} . \qquad (24)$$

Hence, the lower bound on the MSE follows:

$$\mathrm{tr}\, \Delta \ge \frac{M-1}{M(M+1)} - 2 \sum_{r} \sum_{n,m} \frac{\left| \langle \phi_n | \sum_k [\Lambda_k]_r\, \rho_k | \phi_m \rangle \right|^2}{p_n + p_m} . \qquad (25)$$

This is as far as one can get for mixtures of arbitrary states
$\{\rho_r\}$. In the case of mixtures of orthogonal states we can
substitute Eqs. (22) to (24) in Eq. (20) and find a closed expression for
the MSE for multiple copies:

$$E_N = \mathrm{tr}\, \Delta = \frac{M-1}{(M+1)(M+N)} , \qquad (26)$$

where we have used the summation formula in Appendix B. Note that the
error $E_N$ vanishes as $N$ goes to infinity.



C. Identifiability



A mixture is identifiable if there exists a one-to-one correspondence
between $\lambda$ and $\rho_\lambda$. That is to say, iff, given
$\rho_\lambda$, there is no other vector of weights $\lambda'$ satisfying
Eq. (1). In a general situation, though, different vectors $\lambda$ can
give rise to the same density matrix
($\rho_\lambda = \rho_{\lambda'}$ for some $\lambda \neq \lambda'$) and,
therefore, identifiability cannot be taken for granted. Necessary and
sufficient conditions for identifiability of classical finite mixtures,
$p_\lambda(i) = \sum_r \lambda_r p_r(i)$, were established more than four
decades ago by Teicher. These conditions are equivalent to
$\{p_r\}_{r=1}^{M}$ being a linearly independent set. Similarly, the
linear independence of the (density) matrices in a quantum ensemble
$\{\rho_r\}_{r=1}^{M}$ constitutes a necessary and sufficient condition
for the identifiability of quantum finite mixtures: states lying in the
convex hull of a linearly independent set of density matrices will be
identifiable, while all states in the convex hull of a linearly dependent
set will necessarily be unidentifiable, except for possibly some states
on the boundary.
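The linear-independence criterion is easy to test numerically. The following is a minimal sketch, assuming the states are supplied as NumPy arrays; the function name `is_identifiable`, the tolerance, and the two example ensembles (tetrahedron vertices and the six Pauli eigenstates) are illustrative choices rather than anything prescribed by the text. Each density matrix is flattened into a vector and the rank of the stacked vectors is compared with the number of states.

```python
import numpy as np

PAULI = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def bloch_to_rho(n):
    """Qubit density matrix (1 + n.sigma)/2 for a Bloch vector n."""
    return 0.5 * (np.eye(2) + np.einsum("i,ijk->jk", np.asarray(n, dtype=float), PAULI))

def is_identifiable(states, tol=1e-10):
    """True if the density matrices are linearly independent."""
    vectors = np.array([np.asarray(rho).reshape(-1) for rho in states])
    return np.linalg.matrix_rank(vectors, tol=tol) == len(states)

# Four pure states at the vertices of a regular tetrahedron (one possible orientation).
tetra = [bloch_to_rho(np.array(v) / np.sqrt(3))
         for v in [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]]

# Six pure states along +/- x, y, z: more states than the 4 real dimensions available.
six = [bloch_to_rho(sign * np.eye(3)[i]) for i in range(3) for sign in (1, -1)]

print(is_identifiable(tetra), is_identifiable(six))
```

Since qubit density matrices span a real space of dimension four, any qubit ensemble with more than four components is unidentifiable, which is what the second example shows.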

Identifiability is usually assumed in (classical) mixture 
estimation (see e. g. [^T]), since unidentifiable models of- 
ten give rise to ill-defined estimation procedures and their 
asymptotic theories break down. In contrast, our ap- 
proach leads to sensible results for the estimation of quan- 
tum finite mixtures even in unidentifiable scenarios. The 
above results for single-copy case, as well as the deriva- 
tion of the effective model for finite number of copies, can 
be directly applied without taking notice of identifiabil- 
ity considerations. Care must be taken, however, when 
applying the asymptotic methods of the next section to 
unidentifiable mixtures. Such methods assume that the 
errors go to zero as the number of copies increases, which 
cannot be guaranteed if mixtures are unidentifiable. We 
will revisit unidentifiability at the end of Sec. III, where 
we will introduce ways to circumvent this difficulty. 



III. ESTIMATION OF WEIGHTS IN THE 
ASYMPTOTIC LIMIT 

In the preceding sections we have presented protocols to optimally
estimate quantum finite mixtures and have obtained bounds on their
accuracy using a covariance-type error matrix as a cost function. We have
also identified situations where these bounds are attainable and provided
the corresponding optimal measurements; all this, in the framework of
single- and multiple-copy estimation. In this section, we focus on the
latter, in the asymptotic limit when a large number $N$ of copies is
available for the experiment. Although the approach of the preceding
sections can also be carried out in this case, asymptotic expansions
become involved, with a few exceptions where a closed expression can be
found for arbitrary $N$ [see, e.g., Eq. (26)]. Our aim here is to provide
more straightforward means to obtain asymptotically optimal estimation
protocols in general situations and compute the corresponding MSE. For
this, we can resort to the well-known Cramér-Rao (CR) theory and its
quantum extension, which we briefly discuss next, particularized to
finite mixture estimation. A very powerful result, known as the Holevo
bound, will also be presented in this section along with a simple example
of its use. A more detailed and comprehensive presentation, which
includes a discussion on the relationship between this theory and the
Bayesian approach of the preceding sections, can be found in [13].

In this framework, which we will refer to as 'pointwise', one focuses
on a fixed point in parameter space, i.e., the unit simplex in our case,
and restricts oneself to Locally Unbiased (LU) estimators: those for
which $\langle \hat\lambda_x \rangle_\lambda = \lambda$ in some open set,
where, in the same spirit of previous notation,
$\langle\,\cdot\,\rangle_\lambda$ indicates averaging over the conditional
probability $p(x|\lambda)$ at the fixed point $\lambda$. We define the
error matrix $\Delta(\lambda)$ as

$$\Delta(\lambda) = \left\langle (\hat\lambda_x - \lambda)(\hat\lambda_x - \lambda)^t \right\rangle_\lambda = \sum_x p(x|\lambda)\, (\lambda - \hat\lambda_x)(\lambda - \hat\lambda_x)^t . \qquad (27)$$

It depends on the measurement and on the estimator, i.e., on the
particular way one associates $\hat\lambda_x$ to a given outcome $x$ of
the chosen measurement. For the sake of simplicity, in most of this
section we will assume that the mixtures are identifiable. The problem of
dealing with unidentifiable mixtures will be postponed to the last
subsection (III C).

A. The Cramer Rao Bound 

A first important result of the theory is the so-called CR bound. It
states that the error matrix of a LU estimator at $\lambda$ is lower
bounded by the inverse of the Fisher Information defined in (13) with
$p_\lambda(x) = p(x|\lambda) = \mathrm{tr}\, E_x \rho_\lambda$ [note that
in this theory the POVM may depend on the vector $\lambda$; see the
comments after Eq. (30)], namely,

$$\Delta(\lambda) \ge F^{-1}(\lambda) . \qquad (28)$$

Assume now that the same measurement is performed on several independent
copies; i.e., on the state $\rho_\lambda^{\otimes N}$. Due to the
additivity of the Fisher Information, in this multiple-copy scenario one
has

$$\Delta(\lambda) \ge \frac{1}{N}\, F_1^{-1}(\lambda) , \qquad (29)$$

where the subscript 1 refers to the one-copy model $\rho_\lambda$. This
inequality expresses the fact that the MSE of the estimation scales with
the inverse of the number of copies, and the accuracy with which we are
able to estimate $\lambda$ with just one copy sets the scale. It is well
known that under some regularity conditions the maximum likelihood
estimator achieves the CR bound asymptotically.

In spite of its fundamental character, the CR bound has the drawback
that the bound it provides refers to a particular measurement, not
necessarily optimal. To get around this difficulty, we invoke the
Braunstein and Caves inequality, already discussed in Sec. II A, and
obtain

$$\Delta(\lambda) \ge \frac{1}{N}\, H_1^{-1}(\lambda) . \qquad (30)$$

Recall, however, that this bound is not always attainable but, when it
is, the projectors onto the eigenspaces of the SLD $L_r(\lambda)$ define
the optimal measurement. It is important to point out here that practical
use of this approach requires a two-step measurement in order to saturate
the bound. This is necessary because this optimal measurement, and thus
the estimator, depend themselves on $\lambda$, which we do not know
beforehand. To overcome this difficulty, one can take an asymptotically
vanishing fraction of copies, say $\sqrt{N}$, and make an initial
estimate of the weights $\lambda_{\rm ini}$. Then, on the remaining copies
one can perform the measurement that is optimal at $\lambda_{\rm ini}$,
i.e., project on the eigenspaces of $L_r(\lambda_{\rm ini})$ (see
Appendix C for an explicit example of this procedure). Thus, this
two-step adaptive measurement, which is independent of $\lambda$,
approaches the optimal one in the asymptotic limit at leading order in
$1/N$, and one may write

$$\Delta = \int d\lambda\, \pi(\lambda)\, \Delta(\lambda) + o(N^{-1}) . \qquad (31)$$

This equation establishes a bridge between the asymptotic pointwise
theory of this section and the Bayesian approach discussed in the first
part of this paper. With all this in mind, we conclude that for
sufficiently smooth priors $\pi(\lambda)$ it holds that

$$\Delta \ge \frac{1}{N} \int d\lambda\, \pi(\lambda)\, H_1^{-1}(\lambda) + o(N^{-1}) . \qquad (32)$$

So far in this section we have overlooked the fact that not all the
components of $\lambda$ are independent, as $\lambda$ must lie on the
unit simplex. One could circumvent this by simply using the constraint
$\sum_r \lambda_r = 1$ to write a particular component, say $\lambda_M$,
in terms of the remaining $M-1$ as
$\lambda_M = 1 - \sum_{r=1}^{M-1} \lambda_r$. This possibility, however,
introduces a huge asymmetry in the calculation, which may make it
difficult to invert the Fisher Information matrix $H_1$ and compute the
bound (32). Note that inside the unit simplex the variations of $\lambda$
are constrained by $\Delta\lambda \cdot u = 0$, where
$u = (1, 1, \ldots, 1)$. A fully symmetric way of dealing with this issue
is to project the information matrices $F$ and $H$ onto the orthogonal
complement of span$\{u\}$, which we call $S$. Thus, the CR bound,
Eq. (29), takes the form [25]

$$P_S\, \Delta(\lambda)\, P_S \ge \frac{1}{N} \left[ P_S F_1(\lambda) P_S \right]^{-1} , \qquad (33)$$

and similarly for its quantum version in Eq. (30), where $P_S$ stands for
the projector on $S$ and the inverse, $[\,\cdot\,]^{-1}$, is restricted to
the support of $P_S$.
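The projected inverse in Eq. (33) can be sketched with a pseudo-inverse. The example below is illustrative only: it assumes a diagonal information matrix with entries $1/\lambda_r$ (the classical multinomial Fisher information, used here as a stand-in), an arbitrary test point on the simplex, and a hypothetical helper name `projected_inverse`. The result is compared with the multinomial covariance $\mathrm{diag}(\lambda) - \lambda\lambda^t$, which already annihilates $u$.

```python
import numpy as np

def projected_inverse(info, rcond=1e-10):
    """Inverse of P_S info P_S restricted to S, the orthogonal complement of u = (1,...,1)."""
    M = info.shape[0]
    u = np.ones(M) / np.sqrt(M)
    P = np.eye(M) - np.outer(u, u)            # projector onto S
    # The pseudo-inverse discards the zero eigenvalue along u.
    return np.linalg.pinv(P @ info @ P, rcond=rcond)

lam = np.array([0.2, 0.3, 0.5])               # assumed test point on the simplex
H1 = np.diag(1.0 / lam)                       # assumed one-copy information matrix
cov = projected_inverse(H1)

print(np.allclose(cov, np.diag(lam) - np.outer(lam, lam)))
print(np.trace(cov), 1.0 - np.sum(lam ** 2))
```

An explicit `rcond` is passed so that the numerically tiny eigenvalue along $u$ is reliably treated as zero.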

As an example, let us consider again the mixture of $M$ orthogonal
states and compute the asymptotic expression of $E_N$, introduced in
(26). Applying the definition of the SLD in Eq. (16) to the 1-copy family
$\rho_\lambda$ it is straightforward to obtain that
$L_r(\lambda) = P_r/\lambda_r$, where $P_r$ is the projector onto the
support of $\rho_r$. Applying now the definition of the QFI, Eq. (15), to
the same family we obtain $[H_1(\lambda)]_{rs} = \delta_{rs}/\lambda_r$.
For brevity, we omit the arguments and write $H_1'$ for the projection of
$H_1$ onto $S$, i.e., $H_1' = P_S H_1 P_S$, and similarly for other
matrices. Let us start by computing $\det H_1'$ (here the zero eigenvalue
corresponding to the kernel of the projection is, of course, removed from
det). Since i) the determinant of a $d \times d$ matrix is a homogeneous
polynomial of degree $d$ in its matrix elements and ii) the vector $u$
has the same projection on each eigenspace of $H_1$, it follows that
i) $\det H_1'$ must also be a homogeneous polynomial of degree $M-1$ in
$1/\lambda_r$, i.e., in the eigenvalues of $H_1$, and ii) it must be a
symmetric function of these eigenvalues. We also note that $\det H_1'$
must vanish if any two or more of these eigenvalues are set equal to
zero, since in this case $S$ necessarily contains a null subspace of
$H_1$ [in doing so, the condition $\lambda_r \le 1$ is temporarily
lifted, which is legitimate, since the result we are after, Eq. (34)
below, is an algebraic relation that holds for generic $\{\lambda_r\}$
regardless of whether they are probabilities or not]. Hence,

$$\det H_1' = \frac{1}{M} \sum_{\{r_i\}} \prod_{i=1}^{M-1} \frac{1}{\lambda_{r_i}} = \frac{1}{M} \prod_{i=1}^{M} \frac{1}{\lambda_i} , \qquad (34)$$

where the sum extends to all subsets of $M-1$ indexes drawn from
$1, 2, \ldots, M$, and the prefactor $1/M$ can be easily computed by
considering the particular case where all $\lambda_r$ are equal.
Reasoning along the same lines, we conclude that

$$\det H_1'\; \mathrm{tr}\,(H_1')^{-1} = \frac{2}{M} \sum_{\{r_i\}} \prod_{i=1}^{M-2} \frac{1}{\lambda_{r_i}} . \qquad (35)$$

[Note that on the left hand side of this last equation
$\det H_1'\, [(H_1')^{-1}]_{st}$ is the $(s,t)$ cofactor of $H_1'$, i.e.,
the signed determinant of the matrix $H_1'$ with row $s$ and column $t$
removed. It follows that $\det H_1'\, \mathrm{tr}\,(H_1')^{-1}$ is a
homogeneous and symmetric polynomial in $1/\lambda_r$ of degree $M-2$
that vanishes if three or more eigenvalues of $H_1$ are set equal to
zero.] Combining Eq. (35) with Eq. (34), and after some algebra, we
obtain

$$\mathrm{tr}\,(H_1')^{-1} = \Big( \sum_r \lambda_r \Big)^{-1} \sum_{r \neq s} \lambda_r \lambda_s = 1 - \sum_r \lambda_r^2 . \qquad (36)$$

The averaging over the flat prior can be easily performed with the help
of Appendix B, obtaining

$$\int d\lambda\, \pi_{\rm flat}(\lambda)\, \mathrm{tr}\,(H_1')^{-1} = \frac{M-1}{M+1} . \qquad (37)$$

Taking into account (32) and $E_N = \mathrm{tr}\, \Delta = \mathrm{tr}\, \Delta'$
up to $o(N^{-1})$, we finally find that
$E_N = (M-1)/[(M+1)N] + o(N^{-1})$, which indeed agrees with (26) for
large $N$.
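The agreement between the exact multiple-copy result and its asymptotic form can be checked by brute force. The sketch below is an illustration under assumptions: it takes a flat prior, uses the occupation-number expressions of Eqs. (22) to (24) inside Eq. (20), and introduces hypothetical helper names (`compositions`, `mse_orthogonal_sum`, `mse_closed`, `mse_asymptotic`). It sums over all occupation vectors $k$ for small $N$ and $M$, and compares with the closed form (26) and with the leading term $(M-1)/[(M+1)N]$.

```python
from itertools import combinations
from math import comb

def compositions(N, M):
    """All vectors k of M non-negative integers summing to N (stars and bars)."""
    for bars in combinations(range(N + M - 1), M - 1):
        edges = (-1,) + bars + (N + M - 1,)
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(M))

def mse_orthogonal_sum(N, M):
    """MSE for orthogonal states and flat prior, summing Eq. (20) over all k."""
    c_avg = 1.0 / comb(N + M - 1, N)                 # <c_k>, Eq. (24)
    tr_Lambda = (M - 1) / (M * (M + 1))              # trace of Eq. (22)
    gain = 0.0
    for k in compositions(N, M):
        lam_k_sq = sum(((kr - N / M) / (N + M) * c_avg) ** 2 for kr in k)  # Eq. (23)
        gain += lam_k_sq / c_avg
    return tr_Lambda - gain

def mse_closed(N, M):
    """Closed expression of Eq. (26)."""
    return (M - 1) / ((M + 1) * (M + N))

def mse_asymptotic(N, M):
    """Leading-order asymptotic MSE, (M - 1) / [(M + 1) N]."""
    return (M - 1) / ((M + 1) * N)

print(mse_orthogonal_sum(5, 3), mse_closed(5, 3))
print(mse_closed(10_000, 3) / mse_asymptotic(10_000, 3))
```

The ratio in the last line equals $N/(M+N)$, so it approaches one as $N$ grows, consistent with the $o(N^{-1})$ correction.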

B. Holevo bound 

The quantum CR bound is a matrix inequality which is in general not
attainable [a few remarkable exceptions are those discussed in Sec. II A
in the paragraph after Eq. (17), and the example above]. However, there
is a related bound that one can expect to be saturated asymptotically:
the Holevo bound. Indeed, for qubit systems asymptotic attainability has
been proved by Hayashi and Matsumoto in [26], and the general proof for
finite dimensional systems follows from a recent paper by Kahn and Guţă
[23]. We note that attainability here, as in the CR bound, is proven in a
pointwise approach and hence makes implicit use of the two-step adaptive
measurement that we mentioned above. An important difference here is that
at the second step the measurement attaining the Holevo bound will in
general be a collective measurement that cannot be implemented by local
measurements on each copy.

Let us briefly introduce the Holevo bound for quantum finite mixture
estimation. Let $G$ be a positive semi-definite matrix and

$$C_\lambda^N(G) = \min_{(\{E_x\}, \{\hat\lambda_x\})} \mathrm{tr}\, G\, \Delta(\lambda) , \qquad (38)$$

where the minimization is over all pairs
$(\{E_x\}, \{\hat\lambda_x\})$ of measurements on
$\rho_\lambda^{\otimes N}$ and estimators for which the latter is LU at
$\lambda$ (the unbiasedness of an estimator depends on the measurement
through its outcome probability distribution). Eq. (38) is relevant to
the problem we are dealing with because its right hand side gives, e.g.,
the smallest MSE, $\mathrm{tr}\, \Delta(\lambda)$, if $G = 1$. That is,
$C_\lambda^N(1)$ is the MSE of the optimal $N$-copy estimation scheme.

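In Ref. [6] Holevo proved the following bound:

$$C_\lambda^{1}(G) \ge C_\lambda^{H}(G) , \qquad (39)$$

where

$$C_\lambda^{H}(G) = \min_{X \in S_\lambda} \left\{ \mathrm{tr}\, G\, \Re Z[X] + \mathrm{tr} \left| \sqrt{G}\, \Im Z[X]\, \sqrt{G} \right| \right\} . \qquad (40)$$

In this expression $X = (X_1, X_2, \ldots, X_{M-1})$ are Hermitian
matrices, one for each independent parameter (thus, for quantum mixtures,
we will choose $\lambda_M = 1 - \sum_{r=1}^{M-1} \lambda_r$), satisfying
the following relations:

$$\mathrm{tr}\, \rho_\lambda X_r = 0 , \qquad (41)$$

$$\mathrm{tr}\, \partial_r \rho_\lambda X_s = \delta_{rs} , \qquad 1 \le r, s \le M-1 . \qquad (42)$$

The minimization in (40) is over the set $S_\lambda$ of all such $X$.
Finally, $Z[X]$ is the matrix whose elements are given by

$$Z_{rs}[X] = \mathrm{tr}\, \rho_\lambda X_r X_s ; \qquad 1 \le r, s \le M-1 . \qquad (43)$$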



Although the Holevo bound (39) is not in general attainable, it is
attainable for the class of Gaussian models. The recent work [27] on
asymptotic normality shows the asymptotic (local) equivalence between the
many-copy states of finite-dimensional systems and a Gaussian model and
thereby proves the asymptotic attainability of the Holevo bound for
finite dimensional systems, i.e.,

$$\lim_{N \to \infty} N\, C_\lambda^{N}(G) = C_\lambda^{H}(G) . \qquad (44)$$



To relate the above with the Bayesian approach of the 
preceding sections, we need to average over 7r(A): asymp- 
totically, we have trGA = A^^^ / dA 7r( A) Gf(G) + 
o{N~^). Thus, for instance, the MSE E can be com- 
puted as 



E = tTA = 



1 1 d\ni\)C^{l)+o{N'^) 



(45) 



As to whether or not this averaging is legitimate and the 
resulting bound on the averaged cost function is attain- 
able, there exist very good heuristic arguments, as well 
as various examples [22, that this should be the case, 
but no rigorous proof. Thus, this last equation should be 
taken with a grain of salt. 

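To illustrate the use of the Holevo bound in finite mixture estimation, let us assume that $\rho_r$, $1 \le r \le 4$, are four pure qubit states whose Bloch vectors $\vec n_r$ form the vertices of a regular tetrahedron:

[Eq. (46): explicit components of $\vec n_1, \vec n_2, \vec n_3, \vec n_4$; not recoverable from the extracted text] (46)

With this, the Bloch vector of the finite mixture is $\vec r_{\boldsymbol\lambda} = \vec n_4 + \sum_{r=1}^{3}\lambda_r(\vec n_r - \vec n_4)$. With full generality we may write $X_r = a_r + \vec b_r\cdot\vec\sigma$, where $\vec\sigma = (\sigma_x, \sigma_y, \sigma_z)$ are the standard Pauli matrices. Conditions (41) and (42) are equivalent to $a_r = -\vec b_r\cdot\vec r_{\boldsymbol\lambda}$ and $(\vec n_r - \vec n_4)\cdot\vec b_s = \delta_{rs}$, $1 \le r,s \le 3$. This very last equation can be inverted (this will always be the case if the mixture is identifiable) and we obtain $\vec b_r = (3/4)\,\vec n_r$, $1 \le r \le 3$. With this,

$$X_r = \frac{1 - 4\lambda_r + 3\,\vec n_r\cdot\vec\sigma}{4}, \qquad r = 1,2,3. \tag{47}$$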



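We see that (41) and (42) determine $\boldsymbol X$ uniquely and no minimization is required in (40). A straightforward calculation leads to

$$(\Re Z[\boldsymbol X])_{rr} = \frac{(1+2\lambda_r)(1-\lambda_r)}{2}, \tag{48}$$

$$(\Re Z[\boldsymbol X])_{rs} = -\frac{3 + (1-4\lambda_r)(1-4\lambda_s)}{16}, \qquad r \ne s; \tag{49}$$

$$(\Im Z[\boldsymbol X])_{rs} = \frac{\sqrt3}{4}\sum_{t=1}^{3}\epsilon_{rst}\,(\lambda_4 - \lambda_t); \tag{50}$$

where $\epsilon_{rst}$ is the (fully antisymmetric) Levi-Civita tensor in three dimensions (the overall sign of (50) depends on the orientation chosen for the tetrahedron). To compute the MSE we need the following

$$\operatorname{tr}\Re Z[\boldsymbol X] = \frac12\left[3 + \sum_{r=1}^{3}\lambda_r(1-2\lambda_r)\right], \tag{51}$$

$$\operatorname{tr}\left|\Im Z[\boldsymbol X]\right| = \frac{\sqrt3}{2}\left[\sum_{r=1}^{3}(\lambda_4-\lambda_r)^2\right]^{1/2}, \tag{52}$$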
and, averaging over $\pi_{\rm flat}(\boldsymbol\lambda)$, Eq. (45) gives

$$E = \frac{1}{N}\left(\frac{63}{40} + 0.43\right) + o(N^{-1}) \approx \frac{2.0}{N} + o(N^{-1}), \tag{53}$$



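where the first (second) figure in the parenthesis comes from the real (imaginary) part of $Z[\boldsymbol X]$ in Eq. (51) [Eq. (52)]. It is interesting to note that for this example the quantum CR bound is not attainable. Indeed, one can check that the SLD $L_r(\boldsymbol\lambda)$ is given by

$$L_r(\boldsymbol\lambda) = \vec n_r\cdot\vec\sigma + \frac{\vec n_r\cdot\vec r_{\boldsymbol\lambda}}{|\vec r_{\boldsymbol\lambda}|^2 - 1}\left(1 - \vec r_{\boldsymbol\lambda}\cdot\vec\sigma\right), \tag{54}$$

where $\vec r_{\boldsymbol\lambda} = \sum_{r=1}^{4}\lambda_r\vec n_r$ is the Bloch vector of the averaged state $\rho_{\boldsymbol\lambda}$ and $1 \le r \le 4$ (we now treat all components of $\boldsymbol\lambda$ as independent, in accordance with the approach developed in Sec. III A). One can immediately check that the commutator of the SLDs does not vanish, and the quantum CR bound is not saturated. Just for the sake of completeness, the quantum Fisher information matrix is given by

$$(H_1)_{rs} = \vec n_r\cdot\vec n_s + \frac{(\vec n_r\cdot\vec r_{\boldsymbol\lambda})(\vec n_s\cdot\vec r_{\boldsymbol\lambda})}{1 - |\vec r_{\boldsymbol\lambda}|^2}, \tag{55}$$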



for $1 \le r,s \le 4$. Projecting on $S$ with




(56) 



and (pseudo)-inverting, one obtains the relation

$$H_1^{-1} = \Re Z[\boldsymbol X]. \tag{57}$$

After averaging, we observe from (53) that the quantum CR bound

$$E \ge \frac{63}{40N} + o(N^{-1}) \tag{58}$$

cannot be saturated.
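The tetrahedron example can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes one particular orientation of the tetrahedron (the vectors `N_VECS` are an assumption, since any rotated copy gives the same bounds), builds the operators $X_r$ of Eq. (47) as explicit 2x2 matrices, and evaluates $Z_{rs} = \operatorname{tr}\rho X_r X_s$ directly. The helper names (`op`, `z_matrix`, `holevo_terms`, `flat_average`) are hypothetical.

```python
import math
import random

# Pauli matrices as nested lists (entries may be complex).
ID = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def op(a, v):
    """Return the 2x2 matrix a*1 + v.sigma."""
    return [[a * ID[i][j] + v[0] * SX[i][j] + v[1] * SY[i][j] + v[2] * SZ[i][j]
             for j in range(2)] for i in range(2)]

def mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return complex(A[0][0] + A[1][1])

C = 1 / math.sqrt(3)
# Assumed orientation of the regular tetrahedron (not taken from the source).
N_VECS = [(C, C, C), (C, -C, -C), (-C, C, -C), (-C, -C, C)]

def z_matrix(lam):
    """Z_rs = tr(rho X_r X_s) for r, s = 1, 2, 3, with X_r as in Eq. (47)."""
    bloch = [sum(l * n[i] for l, n in zip(lam, N_VECS)) for i in range(3)]
    rho = op(0.5, [x / 2 for x in bloch])
    xs = [op((1 - 4 * lam[r]) / 4, [3 * c / 4 for c in N_VECS[r]])
          for r in range(3)]
    return [[trace(mul(rho, mul(xs[r], xs[s]))) for s in range(3)]
            for r in range(3)]

def holevo_terms(lam):
    """Return (tr Re Z, tr|Im Z|); Im Z is a real antisymmetric 3x3 matrix."""
    z = z_matrix(lam)
    re_part = sum(z[r][r].real for r in range(3))
    # Trace norm of an antisymmetric 3x3 matrix: 2*sqrt(a^2 + b^2 + c^2),
    # where a, b, c are its three independent entries.
    im_part = 2 * math.sqrt(z[0][1].imag ** 2 + z[0][2].imag ** 2
                            + z[1][2].imag ** 2)
    return re_part, im_part

def flat_average(samples=5000, seed=0):
    """Monte Carlo average of both terms over the flat prior on the simplex."""
    rng = random.Random(seed)
    re_sum = im_sum = 0.0
    for _ in range(samples):
        g = [rng.expovariate(1.0) for _ in range(4)]  # normalized -> flat Dirichlet
        total = sum(g)
        re_part, im_part = holevo_terms([x / total for x in g])
        re_sum += re_part
        im_sum += im_part
    return re_sum / samples, im_sum / samples
```

Calling `flat_average()` returns Monte Carlo estimates of the two figures inside the parenthesis of Eq. (53); the first should approach 63/40 as the number of samples grows.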



C. Unidentifiable mixtures in the asymptotic limit 

In the preceding section, we were required to assume that estimation errors become vanishingly small as the number of copies increases. This assumption does not necessarily hold if mixtures are unidentifiable. In order to be able to apply the asymptotic techniques introduced above, we make a useful observation. If a quantum finite mixture is unidentifiable there necessarily exists an orthogonal transformation $\boldsymbol\lambda' = O\boldsymbol\lambda$ such that the states $\rho_{\boldsymbol\lambda}$ depend solely on a reduced number of parameters $\{\xi_r = \lambda'_r\}_{r=1}^{n}$, with $n < M-1$, and are independent of the redundant parameters $\{\eta_r = \lambda'_r\}_{r=n+1}^{M}$. The error matrix $\Delta$ of the original parameters $\boldsymbol\lambda$ is, of course, related to the error matrix $\Delta'$ of the new ones, $\boldsymbol\xi$, $\boldsymbol\eta$, by the similarity transformation $\Delta' = O\Delta O^{t}$. Any measurement performed on the state $\rho_{\boldsymbol\lambda}$ will only give information about the parameters $\boldsymbol\xi$, whereas the components of $\boldsymbol\eta$ have to be guessed independently of the measurement outcomes (e.g., by random choice). The optimal choice for $\boldsymbol\eta$ is, actually, $\langle\boldsymbol\eta\rangle$, and leads to an error that is, of course, independent of the number of copies. This means that in unidentifiable quantum mixture estimation there will always be an intrinsic error associated with the uncertainty in the redundant parameters $\boldsymbol\eta$, which remains constant regardless of the number of copies one is provided with. In the asymptotic limit, one can apply the bounds of the preceding sections to the block of $\Delta'$ corresponding to the relevant components $\boldsymbol\xi$.

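To illustrate this let us consider the unidentifiable qubit mixture defined by

$$\rho_{\boldsymbol\lambda} = \lambda_1|0\rangle\langle 0| + \lambda_2|1\rangle\langle 1| + \lambda_3|+\rangle\langle +| + \lambda_4|-\rangle\langle -| = \frac12\left[1 + (\lambda_1-\lambda_2)\sigma_z + (\lambda_3-\lambda_4)\sigma_x\right], \tag{59}$$

where $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt2$. If we perform the following rotation $O$ in parameter space

$$\begin{pmatrix}\xi_1\\ \xi_2\\ \eta_1\\ \eta_2\end{pmatrix} = \frac{1}{\sqrt2}\begin{pmatrix}1 & -1 & 0 & 0\\ 0 & 0 & 1 & -1\\ 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1\end{pmatrix}\begin{pmatrix}\lambda_1\\ \lambda_2\\ \lambda_3\\ \lambda_4\end{pmatrix}, \tag{60}$$

we have

$$\rho_{\boldsymbol\lambda} = \frac12\left(1 + \sqrt2\,\xi_1\sigma_z + \sqrt2\,\xi_2\sigma_x\right). \tag{61}$$

This shows that $\eta_1$ and $\eta_2$ are redundant parameters, and measurements will give us no information about them. For the simple model in (61) it is straightforward to obtain the Holevo bound. We first check that $\boldsymbol X = (X_1, X_2)$, with

$$X_1 = \frac{\sigma_z}{\sqrt2} - \xi_1 1, \qquad X_2 = \frac{\sigma_x}{\sqrt2} - \xi_2 1, \tag{62}$$

is the solution to conditions (41) and (42) that minimizes (40). It follows that

$$\Im Z[\boldsymbol X] = 0, \tag{63}$$

and gives

$$C^{H}_{\boldsymbol\xi}(1) = 1 - \xi_1^2 - \xi_2^2. \tag{64}$$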



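In the limit $N \to \infty$, we can compute the error coming from the estimation of $\boldsymbol\xi$ through (45), i.e.,

$$\sum_{r=1}^{2}\Delta_{\xi_r} = \frac{1}{N}\int d\lambda\,\pi_{\rm flat}(\boldsymbol\lambda)\left(1 - \xi_1^2 - \xi_2^2\right) + o(N^{-1}) = \frac{9}{10N} + o(N^{-1}), \tag{65}$$

which is asymptotically vanishing. As for the estimation of $\eta_1$ and $\eta_2$, we make the optimal guess

$$\langle\eta_r\rangle = \int d\lambda\,\pi_{\rm flat}(\boldsymbol\lambda)\,\eta_r(\boldsymbol\lambda) = \frac{1}{2\sqrt2}, \tag{66}$$

thus

$$\sum_{r=1}^{2}\Delta_{\eta_r} = \int d\lambda\,\pi_{\rm flat}(\boldsymbol\lambda)\sum_{r=1}^{2}\left[\eta_r(\boldsymbol\lambda) - \langle\eta_r\rangle\right]^2 = \frac{1}{20} \tag{67}$$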

[according to the notation introduced in the para- 
graph below pUj) . this quantity could also be denoted 

byELiAl?^^ 

Putting all pieces together, the estimation error is

$$\sum_{r=1}^{4}\Delta_{rr} = \frac{1}{20} + \frac{9}{10N} + o(N^{-1}). \tag{68}$$

In conclusion, this explicit example shows that unidentifiable mixtures will lead to a non-vanishing estimation error even in the asymptotic limit.
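The two numbers entering the final error of this example, the coefficient of $1/N$ and the $N$-independent offset, follow from second moments of the flat prior on the simplex. The sketch below evaluates them exactly with the moment formula of Appendix B, Eqs. (B1) and (B2). It assumes that the rotation takes the form $\xi_1 = (\lambda_1-\lambda_2)/\sqrt2$, $\eta_1 = (\lambda_1+\lambda_2)/\sqrt2$ (and likewise for $\xi_2$, $\eta_2$ with $\lambda_3$, $\lambda_4$); the function and variable names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def flat_moment(powers):
    """<prod_r lambda_r^k_r> over the flat prior on the simplex, Eqs. (B1)-(B2)."""
    m = len(powers)
    num = factorial(m - 1)
    for k in powers:
        num *= factorial(k)
    return Fraction(num, factorial(m - 1 + sum(powers)))

# Moments for M = 4 components.
m1 = flat_moment([1, 0, 0, 0])    # <lambda_1>
m2 = flat_moment([2, 0, 0, 0])    # <lambda_1^2>
m11 = flat_moment([1, 1, 0, 0])   # <lambda_1 lambda_2>

# Assumed rotation: xi_1 = (l1 - l2)/sqrt(2), eta_1 = (l1 + l2)/sqrt(2).
xi_sq = (2 * m2 - 2 * m11) / 2                          # <xi_1^2> = <xi_2^2>
eta_var = (2 * m2 + 2 * m11) / 2 - (2 * m1) ** 2 / 2    # Var(eta_1) = Var(eta_2)

leading_coefficient = 1 - 2 * xi_sq   # <1 - xi_1^2 - xi_2^2>, multiplies 1/N
constant_error = 2 * eta_var          # intrinsic error from eta_1 and eta_2
```

Using exact rational arithmetic avoids any sampling noise, so the two quantities can be compared directly with the fractions appearing in Eqs. (65), (67) and (68).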



IV. ESTIMATION OF TWO-COMPONENT 
MIXTURES 

In this section we dwell on the simplest quantum mixture scenario, where the average state $\rho_\lambda$ belongs to the 1-simplex

$$\rho_\lambda = \lambda\rho_1 + (1-\lambda)\rho_2, \qquad 0 \le \lambda \le 1 \tag{69}$$

(hence $\lambda_1 = \lambda$, $\lambda_2 = 1-\lambda$). Although the error matrix is $2\times2$, only one of its entries, say $\Delta_{11}$, contains independent information about the accuracy in the estimation of the mixture (69). Therefore, in the following we simply drop the remaining three entries and write $\Delta$ [and likewise for the Fisher information matrix $\mathbf F(\boldsymbol\lambda)$, the quantum Fisher information matrix $\mathbf H(\boldsymbol\lambda)$, etc., to which we will refer as $F(\lambda)$, $H(\lambda)$, etc.]. Since there is only one independent parameter, we may also drop the vector notation and write $\lambda$ instead of $\boldsymbol\lambda$.



A. Single-shot estimation 

The single-copy version of this problem was considered recently in [18], though the optimal measurements and minimal estimation error were only determined when $\rho_1$ and $\rho_2$ are qubit and/or pure states. Our results in Sec. II show that for the two-component mixture in (69), the attainability conditions are fulfilled for any $\rho_1$ and $\rho_2$, and the optimal measurements, along with their minimal estimation error, can always be determined in both single- and multiple-copy scenarios. In particular, it follows from our results that the optimal protocol consists of a projective measurement, where the projectors are those onto the eigenspaces of the SLD $L(\lambda)$. Our results in the present paper thus provide answers to various open questions posed in [18].

Let us focus first on the single-copy estimation. By choosing the optimal estimator (6), the MSE is given by (19). For the mixture (69) this equation can be cast as

$$\Delta = \Delta^0 - 2\left(\Delta^0\right)^2\sum_{nm}\frac{\left|\langle\phi_m|\rho_1-\rho_2|\phi_n\rangle\right|^2}{\nu_n+\nu_m}. \tag{70}$$

[In the case under consideration here, $\Delta^0_{rs} = \langle\lambda_r\lambda_s\rangle - \langle\lambda_r\rangle\langle\lambda_s\rangle$. Thus, $\Delta^0_{11} = \langle\lambda^2\rangle - \langle\lambda\rangle^2 = \Delta^0$, and $\Delta^0_{21} = \langle\lambda(1-\lambda)\rangle - \langle\lambda\rangle\langle 1-\lambda\rangle = -\Delta^0$.] The bound is attained with the measurement characterized by the eigenprojectors of the SLD

$$L(\lambda) = 2\Delta^0\sum_{nm}\frac{\langle\phi_m|\rho_1-\rho_2|\phi_n\rangle}{\nu_n+\nu_m}\,|\phi_m\rangle\langle\phi_n|, \tag{71}$$

where we recall that $\{|\phi_n\rangle\}$ ($\nu_n$) are the eigenvectors (eigenvalues) of $\langle\rho_\lambda\rangle$. One can readily check, by explicitly solving the equation that defines the SLD, that for a uniform prior $\pi_{\rm flat}(\lambda) = 1$ and for pure $\rho_1$ and $\rho_2$ (or for qubit states with the same purity) we have $L(\lambda = 1/2) \propto (\rho_1-\rho_2)$, in agreement with [18]. Accordingly, for pure states $\Delta = [2 + \operatorname{tr}(\rho_1\rho_2)]/36$.
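As a sanity check of the single-copy expression (70), the following sketch evaluates it for two qubit states given by their Bloch vectors and a flat prior (for which $\Delta^0 = 1/12$ and $\langle\rho_\lambda\rangle = (\rho_1+\rho_2)/2$). It is an illustration under those assumptions only; the helper names are hypothetical, and the identity $|\langle\phi_m|A|\phi_n\rangle|^2 = \operatorname{tr}(P_m A P_n A)$ for rank-one projectors is used to avoid computing eigenvectors explicitly.

```python
import math

ID = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def op(a, v):
    """Return the 2x2 matrix a*1 + v.sigma."""
    return [[a * ID[i][j] + v[0] * SX[i][j] + v[1] * SY[i][j] + v[2] * SZ[i][j]
             for j in range(2)] for i in range(2)]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return complex(A[0][0] + A[1][1])

def bayes_mse_flat(r1, r2):
    """Eq. (70) for qubit Bloch vectors r1, r2 and a flat prior.

    Assumes the averaged state is not pure (i.e. not r1 == r2 with |r1| = 1).
    """
    delta0 = 1 / 12
    mean = [(a + b) / 2 for a, b in zip(r1, r2)]       # Bloch vector of <rho>
    norm = math.sqrt(sum(x * x for x in mean))
    # If <rho> is maximally mixed any basis works; pick the z axis.
    unit = [x / norm for x in mean] if norm > 0 else [0.0, 0.0, 1.0]
    diff = op(0.0, [(a - b) / 2 for a, b in zip(r1, r2)])  # rho_1 - rho_2
    eig = [((1 + s * norm) / 2, op(0.5, [s * x / 2 for x in unit]))
           for s in (1, -1)]                            # (eigenvalue, projector)
    total = 0.0
    for nu_m, p_m in eig:
        for nu_n, p_n in eig:
            total += trace(mul(mul(p_m, diff), mul(p_n, diff))).real / (nu_m + nu_n)
    return delta0 - 2 * delta0 ** 2 * total
```

For two pure states with overlap $\operatorname{tr}\rho_1\rho_2 = \cos^2 t$, e.g. `bayes_mse_flat((math.sin(t), 0, math.cos(t)), (-math.sin(t), 0, math.cos(t)))`, the result can be compared with $[2+\operatorname{tr}(\rho_1\rho_2)]/36$.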



B. Multiple-copy estimation and the asymptotic 
limit 



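Although a straightforward exercise, computing $\Delta$ for $N > 1$ copies of $\rho_\lambda$ is a tedious task even for two-component mixtures. In most cases, the resulting expressions cannot be written in closed form for arbitrary $N$ and are thus not very revealing. So, rather than attempting to present a general case, we have selected a particular example, which we will later use to illustrate the connection between the Bayesian and the asymptotic pointwise approaches.

Assume $\rho_1$ and $\rho_2$ are commuting non-orthogonal qubit states. Let us further assume that $\rho_2$ is pure and that the prior is flat. Then, we can choose a basis so that

$$\rho_\lambda = \lambda\begin{pmatrix}1-e & 0\\ 0 & e\end{pmatrix} + (1-\lambda)\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}. \tag{72}$$

Proceeding as in Sec. II B, the $N$-copy state $\rho_\lambda^{\otimes N}$ can be cast in the form $\rho_\lambda^{\otimes N} = \sum_k c_k(\lambda)\rho_k$ with

$$c_k(\lambda) = \binom{N}{k}(\lambda e)^k(1-\lambda e)^{N-k}, \qquad 0 \le k \le N, \tag{73}$$

and $\rho_k = \mathcal S\left[|1\rangle\langle1|^{\otimes k}\otimes|0\rangle\langle0|^{\otimes(N-k)}\right]$, where $\mathcal S$ denotes (normalized) symmetrization. Hence, using Eq. (19) we have that the minimum error is given by [recall that we are assuming the flat prior $\pi_{\rm flat}(\lambda) = 1$]

$$\Delta = \frac{1}{12} - \frac12\sum_{nm}\frac{\left|\langle\phi_m|\sum_k B_k\rho_k|\phi_n\rangle\right|^2}{\nu_n+\nu_m}, \tag{74}$$

where

$$B_k = 2\Delta_k = 2\left[\langle\lambda c_k(\lambda)\rangle - \langle\lambda\rangle\langle c_k(\lambda)\rangle\right] = \binom{N}{k}\int_0^1 d\lambda\,(2\lambda-1)(e\lambda)^k(1-e\lambda)^{N-k}, \tag{75}$$

and where now

$$\{|\phi_n\rangle\} = \mathrm{perms}\left\{|1\rangle^{\otimes k}\otimes|0\rangle^{\otimes(N-k)}\right\}_{k=0}^{N} \tag{76}$$

are the $2^N$ eigenvectors of $\langle\rho_\lambda^{\otimes N}\rangle$ ($\mathrm{perms}\{\cdot\}$ stands for the set of distinct permutations of the set $\{\cdot\}$). Defining

$$A_k = \binom{N}{k}\int_0^1 d\lambda\,(e\lambda)^k(1-e\lambda)^{N-k}, \tag{77}$$

the eigenvalues of $\langle\rho_\lambda^{\otimes N}\rangle$ are $\nu_k = A_k/\binom{N}{k}$, and have multiplicity $\binom{N}{k}$. Therefore,

$$\Delta = \frac{1}{12} - \frac14\sum_{k=0}^{N}\frac{B_k^2}{A_k}. \tag{78}$$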



As shown in Appendix B 3, the terms of the sum above can be written as ratios of Regularized Incomplete Beta Functions, thus providing a more compact expression for the error. However, we can only give a closed form for $\Delta$ in the asymptotic limit of very large number of copies. This requires evaluating the sum in (78) up to order $1/N$:

$$\sum_{k=0}^{N}\frac{B_k^2}{A_k} = \frac13 - \frac{2(3-2e)}{3eN} + o(N^{-1}) \tag{79}$$

(details of this evaluation are also given in Appendix B). Plugging this expression into (78) we obtain

$$\Delta = \frac{3-2e}{6eN} + o(N^{-1}). \tag{80}$$



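With the asymptotic techniques introduced in Sec. III A the previous evaluation can be simplified a great deal. Moreover, these techniques enable us to give closed-form expressions of $\Delta$ for rather more general two-component mixtures. As already mentioned, the attainability of the CR bound is guaranteed for these (one-parameter) mixtures and its application is particularly simple. From our discussion in Sec. III A, Eq. (32), we can write

$$\Delta = \frac1N\int d\lambda\,\pi(\lambda)\,H_1^{-1}(\lambda) + o(1/N), \tag{81}$$

where we recall that $H_1$ is the QFI of the 1-copy model (69). As can be simply read off from (70),

$$H_1(\lambda) = 2\sum_{nm}\frac{\left|\langle\phi_m|\rho_1-\rho_2|\phi_n\rangle\right|^2}{\nu_n+\nu_m}. \tag{82}$$

Note, however, that $\{|\phi_n\rangle\}$ ($\nu_n$) are now the eigenvectors (eigenvalues) of $\rho_\lambda$, rather than of $\langle\rho_\lambda\rangle$, and the QFI is thus a function of $\lambda$. In the Bloch representation we can write

$$\rho_r = \frac12\left(1 + \vec r_r\cdot\vec\sigma\right), \qquad r = 1,2, \tag{83}$$

which holds when $\rho_1$ and $\rho_2$ are both qubit states, but also when they are pure states in arbitrary dimensions. Since in these cases the two density matrices can be taken to be real, it suffices to consider $\vec\sigma = (\sigma_x, \sigma_z)$. The eigenvalues and eigenvectors of $\rho_\lambda$ can be written as

$$\nu_\pm = \frac{1\pm r_\lambda}{2}, \qquad |\phi_\pm\rangle\langle\phi_\pm| = \frac12\left(1 \pm \frac{\vec r_\lambda\cdot\vec\sigma}{r_\lambda}\right), \tag{84}$$

where, as in previous examples, $\vec r_\lambda = \lambda\vec r_1 + (1-\lambda)\vec r_2$ is the Bloch vector of $\rho_\lambda$, and we have defined $r_\lambda = |\vec r_\lambda|$. After some algebra one finds

$$H_1(\lambda) = |\vec r_1-\vec r_2|^2 + \frac{\left[(\vec r_1-\vec r_2)\cdot\vec r_\lambda\right]^2}{1-r_\lambda^2}. \tag{85}$$

For pure states, $\rho_1 = |\psi_1\rangle\langle\psi_1|$ and $\rho_2 = |\psi_2\rangle\langle\psi_2|$ (i.e., $r_1 = r_2 = 1$), one can further simplify this expression and write

$$H_1^{-1}(\lambda) = \frac{\lambda(1-\lambda)}{1-|\langle\psi_1|\psi_2\rangle|^2}. \tag{86}$$

If the prior is assumed to be flat, $\pi_{\rm flat}(\lambda) = 1$, a trivial integration leads to

$$\Delta_{\rm pure} = \frac{1}{6N\left(1-|\langle\psi_1|\psi_2\rangle|^2\right)} + o(1/N). \tag{87}$$

If $\rho_1$ and $\rho_2$ are not pure, the Bloch representation (83) holds only for qubit states. Assuming the flat prior, after a lengthy calculation one finds (to leading order in $1/N$)

$$\Delta_{\rm qubit} = \frac{1}{6N}\,\frac{6 - r_1^2 - r_2^2 - |\vec r_1+\vec r_2|^2}{|\vec r_1-\vec r_2|^2 - r_1^2r_2^2 + (\vec r_1\cdot\vec r_2)^2} = \frac{1}{6N}\,\frac{3 - \operatorname{tr}\rho_1^2 - \operatorname{tr}\rho_2^2 - \operatorname{tr}\rho_1\rho_2}{\operatorname{tr}\rho_1^2 + \operatorname{tr}\rho_2^2 - \operatorname{tr}\rho_1^2\operatorname{tr}\rho_2^2 - (2-\operatorname{tr}\rho_1\rho_2)\operatorname{tr}\rho_1\rho_2}. \tag{88}$$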

Recall that for the cases at hand there exist adaptive measurements that attain the above bound. The reader is referred to Appendix C for a specific illustration of this general result.
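The leading-order qubit result can be cross-checked numerically. The sketch below assumes the standard Bloch-vector form of the single-copy QFI, $H_1(\lambda) = |\vec v|^2 + (\vec v\cdot\vec r_\lambda)^2/(1-r_\lambda^2)$ with $\vec v = \vec r_1-\vec r_2$, integrates $1/H_1$ over a flat prior with a midpoint rule, and compares the result with the closed form quoted above for mixed qubit states. All function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qfi(r1, r2, lam):
    """Single-copy QFI H_1(lambda) for qubit Bloch vectors r1 and r2."""
    v = [a - b for a, b in zip(r1, r2)]
    r = [lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
    return dot(v, v) + dot(v, r) ** 2 / (1 - dot(r, r))

def asymptotic_coefficient(r1, r2, steps=2000):
    """Midpoint-rule estimate of the flat-prior average of 1/H_1 (times N)."""
    h = 1 / steps
    return sum(h / qfi(r1, r2, (i + 0.5) * h) for i in range(steps))

def closed_form(r1, r2):
    """Coefficient of 1/N in the mixed-qubit closed form quoted above."""
    a, b, c = dot(r1, r1), dot(r2, r2), dot(r1, r2)
    plus = a + b + 2 * c     # |r1 + r2|^2
    minus = a + b - 2 * c    # |r1 - r2|^2
    return (6 - a - b - plus) / (6 * (minus - a * b + c * c))
```

Because $1/H_1$ is a quadratic polynomial in $\lambda$ for qubits, the midpoint rule converges very quickly. For pure states, `closed_form` reduces to $1/[6(1-|\langle\psi_1|\psi_2\rangle|^2)]$, the coefficient in Eq. (87).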

Before ending this section, we come back to the two commuting states example in Eq. (72), for which the estimation error, Eq. (80), was worked out entirely in the Bayesian framework and the limit $N \to \infty$ was taken afterwards. The same estimation error can be obtained by applying the pointwise CR result (81). It is straightforward to check that this much less costly procedure leads to the same result (80), as it should. Recall, however, that it leads to sensible results only if the number $N$ of copies is exceedingly large, whereas the Bayesian approach works for any $N$.
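The comparison between the two approaches can be made concrete for the commuting-state example. The sketch below evaluates the exact Bayesian error of Eq. (78) with rational arithmetic (expanding $(1-e\lambda)^{N-k}$ binomially and integrating term by term) and the pointwise CR coefficient obtained from the classical Fisher information of the Bernoulli variable with success probability $\lambda e$, namely $H_1(\lambda) = e/[\lambda(1-e\lambda)]$. The function names are hypothetical, and $0 < e \le 1$ is assumed.

```python
from fractions import Fraction
from math import comb

def moment(n, k, e, shift):
    """Integral over [0, 1] of lambda^shift (e*lambda)^k (1 - e*lambda)^(n-k)."""
    return sum(Fraction(comb(n - k, j)) * (-e) ** j * e ** k / (k + j + shift + 1)
               for j in range(n - k + 1))

def bayes_mse(n, e):
    """Exact N-copy error of Eq. (78) for the commuting example, flat prior."""
    e = Fraction(e)
    total = Fraction(0)
    for k in range(n + 1):
        a_k = comb(n, k) * moment(n, k, e, 0)                              # Eq. (77)
        b_k = comb(n, k) * (2 * moment(n, k, e, 1) - moment(n, k, e, 0))   # Eq. (75)
        total += b_k ** 2 / a_k
    return Fraction(1, 12) - total / 4

def cr_limit(e):
    """Flat-prior average of 1/H_1 = lambda (1 - e*lambda)/e (coefficient of 1/N)."""
    e = Fraction(e)
    return (3 - 2 * e) / (6 * e)
```

For instance, `float(40 * bayes_mse(40, Fraction(1, 2)))` can be compared with `float(cr_limit(Fraction(1, 2)))` to see how quickly $N\Delta$ approaches its asymptotic value; for $e = 1$ (orthogonal states) the exact result is $1/[6(N+2)]$ for every $N$.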
V. CONCLUSIONS

Quantum ensembles embody what in classical statis- 
tics is known as finite mixtures, and can thus be viewed 
as their quantum counterpart. More precisely, we have 
a quantum finite mixture whenever a signal can be char- 
acterized by a density matrix that is the average of a set 
of known states (pure or mixed), as is often the case in 
quantum communication. In these situations, one wishes 
to find the probability law that best describes the signal, 
or in other words, the weights that define the quantum 
ensemble. This has been the subject of the present paper, 
where we have relied on quantum estimation theory, but 
also broadened the field by proposing new applications 
and tools. 

The topics addressed in this paper include: the precise 
definition of quantum finite mixtures, as an extension of 
finite mixtures to the quantum domain; optimal estima- 
tion (of their weights) when a given number of copies of 
the average state is available for measurement; optimal 
estimation in the asymptotic regime of large number of 
copies; and characterization of the (un)identifiability of 
quantum mixtures. For each of these topics we have an- 
swered the relevant questions and provided useful results, 
of which we also give some examples of application. 

Going into more detail, we have approached optimal- 
ity from both the Bayesian and the 'pointwise' points 
of view. In the former, one minimizes an averaged cost 
function, which we have chosen to be the covariance-type 
error matrix of the estimation, over a joint probability in- 
volving the measurement outcomes as well as the prior 
knowledge of the weights. Our key result is $\Delta = \Delta^0 - F$ [see Eq. (12)]. It states that the error matrix is the intrinsic uncertainty of the weights minus the Fisher information matrix, which quantifies the information gained
in the measurement process. This exact relation, valid
for any number of copies, is linear in the Fisher infor- 
mation matrix, in contrast to the Cramer-Rao bound, 
where the error is lower bounded by the inverse of the 
Fisher information matrix. From our relation one ob- 
tains a measurement independent lower bound on the 
error matrix in terms of the Quantum Fisher Informa- 
tion. In those cases where the Braunstein-Caves inequal- 
ity (which states that the Fisher Information matrix is 
upper bounded by the Quantum Fisher Information) is 
saturated our bound is attainable for any number of 
copies. When this holds (e.g., two-component mixtures), 
we give the optimal measurement protocol, which turns 
out to be of von Neumann type. 

As to the pointwise approach to quantum mixture estimation, we have briefly introduced the Quantum Cramér-Rao and the Holevo bounds in the specific context at hand. We have next applied these tools to obtain lower bounds for the error matrix of the weights when the number of copies of the average state is asymptotically large. In those situations, the Bayesian approach becomes rather involved and it is advisable to switch to the tools under discussion. Although the Quantum Cramér-Rao and the Holevo bounds can be applied to unidentifiable mixtures, their use requires some technicalities that we have commented upon and illustrated with an example. As one would expect, the error of the weight estimation for such mixtures does not vanish even if an infinite number of copies were available. A discussion on the relationship between the Bayesian and pointwise approaches has also been given, as well as an example illustrating that the two approaches give consistent results.

Among the examples one can find in this paper, we 
would like to highlight that of a mixture of a number 
of orthogonal states, which is relevant in the context of 
channel estimation. For this problem, and assuming a 
flat prior distribution of weights we have been able to 
write the minimal square error in a closed form, valid 
for any number of orthogonal states and any number of 
copies of the average state. 

This paper is mostly devoted to the formalism and 
general results concerning quantum finite mixtures and 
the estimation of their weights. The examples are chosen 
for the sake of illustration, rather than for their prac- 
tical relevance. As mentioned in the introduction, real 
applications of our work are, e.g., the characterization 
of signals in relevant quantum communication problems 
and the estimation of probabilities with which various 
errors occur in a given channel. We have shown that in 
some instances the bounds we give are attainable by lo- 
cal two-step adaptive measurements. It remains an open 
question to establish whether or not collective measure- 
ments are necessary in the general case. Future exten- 
sions of our work also include the estimation of mixtures 
of continuous variable systems. 
Appendix A: The $\sigma_\lambda$ are physical states

It is clear from its definition in dTTI) that trcx = 1. 
So a\ is a proper density matrix if (7^ > 0. To prove 
this inequality, we take any state and define = 
{ij\px\ip) > 0. Recalling Eqs. © and we see that 

the relation {iplaxlijj) > is equivalent to 



(p. 



, (Pt) 



-{Xr) 



>0. (Al) 



= 1.1 



[Note that (XrPp/ipt) > and EA^rpt) / (pt) 
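But (A1) immediately follows from the inequality

$$(\boldsymbol x-\boldsymbol z)\cdot(\boldsymbol y-\boldsymbol z) \ge -1/2, \tag{A2}$$

where $\boldsymbol x = \{x_r\}$, $\boldsymbol y = \{y_r\}$ and $\boldsymbol z = \{z_r\}$ stand for any three probability vectors.

For the sake of completeness, we also prove (A2). We just need to notice that $(\boldsymbol x-\boldsymbol z)\cdot(\boldsymbol y-\boldsymbol z)$ as a function of $\boldsymbol z$ has a minimum at $\boldsymbol z_0 = (\boldsymbol x+\boldsymbol y)/2$. For any $\boldsymbol x$, $\boldsymbol y$ and $\boldsymbol z$ we can thus write

$$(\boldsymbol x-\boldsymbol z)\cdot(\boldsymbol y-\boldsymbol z) \ge (\boldsymbol x-\boldsymbol z_0)\cdot(\boldsymbol y-\boldsymbol z_0) = -\frac{|\boldsymbol x-\boldsymbol y|^2}{4} \ge -\frac12,$$

where the last step uses $|\boldsymbol x-\boldsymbol y|^2 \le 2$ for probability vectors. This is the inequality (A2).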



Appendix B: Useful formulae

1. Averages with a flat prior

Recall our notation: $d\lambda = \delta\left(\sum_r\lambda_r - 1\right)\prod_r d\lambda_r$. Then, one can prove the following useful result

$$\int d\lambda\,\prod_{r=1}^{M}\lambda_r^{k_r} = \frac{\prod_r k_r!}{\left(M-1+\sum_r k_r\right)!}, \tag{B1}$$

where the integration is restricted to positive values of $\lambda_r$, $r = 1,\dots,M$. Although we use this integral for $k_r$ being non-negative integers, the result can be generalized to complex $k_r$ by simply replacing the factorials by Euler Gamma functions: $k! \to \Gamma(k+1)$.

In particular, Eq. (B1) and the normalization condition $\int d\lambda\,\pi(\boldsymbol\lambda) = 1$ imply that the flat distribution is given by

$$\pi_{\rm flat}(\boldsymbol\lambda) = (M-1)!. \tag{B2}$$

2. Sums

Some results in Sec. II B require computing sums of the form $\sum_{\boldsymbol k,r} f(k_r)$, where $k_r$ are the components of the vector $\boldsymbol k$. They are non-negative integers that add up to $N$, and the sum extends over all the

$$\binom{M+N-1}{M-1} \tag{B3}$$

such vectors. To compute these sums, we first note that

$$\sum_{\boldsymbol k,r} f(k_r) = \sum_{r=1}^{M}\left(\sum_{\boldsymbol k} f(k_r)\right) = M\sum_{\boldsymbol k} f(k_1), \tag{B4}$$

where we have used that the sum in parentheses is independent of $r$. This is so because the set of all vectors $\boldsymbol k$ is invariant under $k_r \to k_{\sigma(r)}$, where $\sigma$ is any permutation of the symmetric group $S_M$, and thus $\sum_{\boldsymbol k} f(k_r) = \sum_{\boldsymbol k} f(k_{\sigma(r)})$. We next note that any vector $\boldsymbol k$ whose first component is fixed to be $k_1$ gives the same contribution, $f(k_1)$, to the last sum in (B4). The number of such vectors follows from (B3) by simply making the substitutions $M \to M-1$ and $N \to N-k_1$. Hence,

$$\sum_{\boldsymbol k,r} f(k_r) = M\sum_{k_1=0}^{N}\binom{N-k_1+M-2}{M-2} f(k_1). \tag{B5}$$

For the particular case we need in Sec. II B, $f(x) = x^2$ and the corresponding sum gives

$$\frac{2N+M-1}{M+1}\,\frac{(M+N-1)!}{(M-1)!\,(N-1)!}. \tag{B6}$$
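The counting argument behind Eqs. (B3) to (B6) is easy to verify by brute force for small $M$ and $N$. The sketch below enumerates all vectors of non-negative integers adding up to $N$, and compares the direct sum with the single-sum reduction and with the closed form for the quadratic case, taking $f(x) = x^2$ as an assumption consistent with the closed form. The function names are hypothetical.

```python
from itertools import product
from math import comb, factorial

def brute_force(m, n, f):
    """Sum f(k_r) over all r and all non-negative integer vectors k with sum n."""
    total = 0
    for k in product(range(n + 1), repeat=m):
        if sum(k) == n:
            total += sum(f(x) for x in k)
    return total

def reduced_sum(m, n, f):
    """Single sum over the first component, as in Eq. (B5) (requires m >= 2)."""
    return m * sum(comb(n - k1 + m - 2, m - 2) * f(k1) for k1 in range(n + 1))

def closed_form_squares(m, n):
    """Closed form of Eq. (B6) for f(x) = x^2 (requires n >= 1)."""
    return ((2 * n + m - 1) * factorial(m + n - 1)
            // ((m + 1) * factorial(m - 1) * factorial(n - 1)))
```

For example, with `m = 3` and `n = 4` all three functions (using `f = lambda x: x * x`) should return the same integer.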



3. Evaluation of the sum (79)

Recalling the definitions (75) and (77), and after some algebra, we have

$$\sum_{k=0}^{N}\frac{B_k^2}{A_k} = \frac{1}{N+1}\,\frac{4}{e^3}\sum_{k=0}^{N}\left(\frac{k+1}{N+2}\right)^2\frac{I_e^2(k+2,\bar k+1)}{I_e(k+1,\bar k+1)} - 1 \equiv \frac{R(e)}{e^3} - 1, \tag{B7}$$

where $\bar k = N-k$ and $I_x(a,b)$ is the Regularized Incomplete Beta Function,

$$I_x(a,b) = \frac{B_x(a,b)}{B(a,b)} = \frac{1}{B(a,b)}\int_0^x dt\,t^{a-1}(1-t)^{b-1}. \tag{B8}$$

To obtain (B7) we have also used that

$$\sum_{k=0}^{N} I_e(k+1,\bar k+1) = (N+1)\,e \tag{B9}$$

and

$$\sum_{k=0}^{N}\frac{k+1}{N+2}\,I_e(k+2,\bar k+1) = (N+1)\,\frac{e^2}{2}, \tag{B10}$$

which both follow immediately from the definition in Eq. (B8). Recall also that $B_1(a,b) = B(a,b)$, where $B(a,b)$ is the standard (complete) Beta Function, $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$. According to the Euler-Maclaurin formula, the sum in (B7) can be approximated by an integral which, after differentiating with respect to $e$, can be cast as

$$R'(e) = \frac{4N}{N+1}\int_0^1 dx\left(\frac{Nx+1}{N+2}\right)^2\frac{I_e(Nx+2,N\bar x+1)}{I_e(Nx+1,N\bar x+1)}\left[\frac{2(N+2)e}{Nx+1} - \frac{I_e(Nx+2,N\bar x+1)}{I_e(Nx+1,N\bar x+1)}\right]\frac{e^{Nx}(1-e)^{N\bar x}}{B(Nx+1,N\bar x+1)}, \tag{B11}$$
where $\bar x = 1-x$. The last factor peaks at $x = e$ as $N$
becomes large and can be replaced by the Gaussian



' ■ cxp < —N- 



N y 27re(l - e) 



2e(l - e) 



Since we are interested only in terms that vanish asymp- 
totically as N~^, we can drop those that vanish exponen- 
tially, and approximate Eq. (|Blip by 

^ ' J^^ \N + 2 J \ Nx+l j 



I N 
27re(l - e) 



exp <^ -N 



2e(l-e) 



(B12) 



For the same reason, we can expand the first line in ()B12p 
up to first order in = (x — e)^ and write 



/oo 
-OO 



X 1 / - — r exp <^ ~N- 



27re(l - e) ^ { 2e(l - e) 
(B13)

The remaining integral gives

$$R'(e) = 4e^2 - \frac{4}{N}\,e(1-e). \tag{B14}$$

Hence,

$$R(e) = R(0) + \int_0^e ds\,R'(s),$$

from which the final result follows.



Appendix C: Two-step adaptive measurement in the asymptotic limit

In this appendix we give an explicit example of the two-step adaptive measurement protocol that attains the Cramér-Rao bound asymptotically [see Sec. III A]. To ease the calculation we choose the simplest instance: that of a mixture of two pure states, $\rho_\lambda = \lambda\rho_1 + (1-\lambda)\rho_2$. This mixture has been already considered in Sec. IV, in the paragraph after Eq. (85). Here we stick to the same notation. If $\rho_r$ ($r = 1,2$) are pure, without loss of generality they can be chosen to be

$$\rho_r = \frac12\left[1 + \sigma_z\cos\theta + (-1)^{r+1}\sigma_x\sin\theta\right] \tag{C1}$$

(as if they were qubit states on the equator of the Bloch sphere), where $\cos\theta = \sqrt{\operatorname{tr}\rho_1\rho_2} = |\langle\psi_1|\psi_2\rangle|$ is the overlap.

Let us assume that we are given $N$ copies of the state $\rho_\lambda$. On a first stage of the protocol, we take $\sqrt N$ of these copies and perform on each of them the same measurement, with the aim of obtaining an initial, rough estimate of $\lambda$, which we denote by $\lambda_{\rm ini}$. Since these measurements use uncorrelated copies and are themselves independent, we expect to benefit from the well understood statistical improvement that results from averaging over the $\sqrt N$ samples. Thus, we can assume that, on average,

$$(\lambda - \lambda_{\rm ini})^2 \simeq \frac{a}{\sqrt N}, \tag{C2}$$

where $a$ is some constant whose value depends on the precise measurement that we perform.

On a second stage, we refine the rough estimation obtained in the preceding stage by performing a (nearly optimal) measurement on the remaining $N - \sqrt N$ copies. As discussed in Sec. III A (see also Sec. II A), the optimal measurement is described by the set of projectors $\{P_\pm(\lambda)\}$ (it is a von Neumann measurement) onto the different eigenspaces of the SLD, $L(\lambda)$, of our model evaluated at $\lambda$. For our example, one can readily find that

$$L(\lambda) = \frac{(1-2\lambda)(1 - \sigma_z\cos\theta) + \sigma_x\sin\theta}{2\lambda(1-\lambda)}. \tag{C3}$$

However, since we do not know the true value of $\lambda$, we choose the measurement to be given by $\{P_\pm(\lambda_{\rm ini})\}$, and hope this change will not affect optimality. Let us check that this is indeed the case. To this end, we diagonalize (C3), obtain $\{P_\pm(\lambda_{\rm ini})\}$ and, in turn, compute its Fisher information. We obtain

$$F_1(\lambda) = \frac{\sin^2\theta}{\lambda(1-\lambda) + (\lambda_{\rm ini}-\lambda)^2\cos^2\theta} \tag{C4}$$

(recall that the subscript 1 refers to one copy). Thus, the error of performing this measurement on the $N - \sqrt N$ copies is

$$\Delta(\lambda) = \frac{\lambda(1-\lambda) + (\lambda_{\rm ini}-\lambda)^2\cos^2\theta}{(N-\sqrt N)\sin^2\theta}. \tag{C5}$$

For sufficiently large $N$ (so that $\sqrt N$ itself is also very large), Eq. (C2) holds on average, and

$$\Delta(\lambda) = \frac{\lambda(1-\lambda)\csc^2\theta}{N(1-N^{-1/2})} + \frac{a\cot^2\theta}{N^{3/2}(1-N^{-1/2})} = \frac{\lambda(1-\lambda)\csc^2\theta}{N} + O(N^{-3/2}), \tag{C6}$$

thus attaining the optimal bound, as can be read off from (86).
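The Fisher information of the second-stage measurement, Eq. (C4), can be verified with a few lines of Bloch-vector arithmetic. The sketch below assumes the parametrization of Eq. (C1) and takes the measurement axis to be the direction of the traceless part of the SLD of $\rho_\lambda$ evaluated at $\lambda_{\rm ini}$, i.e. proportional to $(\sin\theta, 0, -(1-2\lambda_{\rm ini})\cos\theta)$; the function names are hypothetical.

```python
import math

def fisher_adaptive(lam, lam_ini, theta):
    """Fisher information at lam of the projective measurement tuned to lam_ini."""
    s, c = math.sin(theta), math.cos(theta)
    # Bloch vector of rho_lambda and its derivative for the states of Eq. (C1).
    r = ((2 * lam - 1) * s, 0.0, c)
    dr = (2 * s, 0.0, 0.0)
    # Measurement axis: direction of the traceless part of the SLD at lam_ini.
    axis = (s, 0.0, -(1 - 2 * lam_ini) * c)
    norm = math.sqrt(sum(x * x for x in axis))
    axis = tuple(x / norm for x in axis)
    info = 0.0
    for sign in (1, -1):
        p = (1 + sign * sum(a * b for a, b in zip(axis, r))) / 2    # outcome probability
        dp = sign * sum(a * b for a, b in zip(axis, dr)) / 2        # its derivative
        info += dp * dp / p
    return info

def fisher_closed_form(lam, lam_ini, theta):
    """Closed form of Eq. (C4)."""
    return math.sin(theta) ** 2 / (
        lam * (1 - lam) + (lam_ini - lam) ** 2 * math.cos(theta) ** 2)
```

Setting `lam_ini = lam` recovers the optimal value $\sin^2\theta/[\lambda(1-\lambda)]$, and a small mismatch between `lam_ini` and `lam` only reduces the Fisher information at second order, which is why the first-stage estimate does not spoil optimality asymptotically.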